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ABSTRACT 



We present progress toward using scanned OSSE observations for mapping and 
sky survey work. To this end, we have developed a technique for detecting pointlike 
sources of unknown number and location, given that they appear in a background 
which is relatively featureless or which can be modeled. The technique, based on the 
newly developed pixon concept and mean field annealing, is described, with sample 
reconstructions of data from the OSSE Virgo Survey. The results demonstrate the 
capability of reconstructing source information without any a priori information about 
the number and/or location of pointlike sources in the field-of-view. 

Subject headings: 
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1. Imaging with OSSE 

The Oriented Scintillation Spectrometer Ex- 
periment (OSSE) aboard the Compton Gamma 
Ray Observatory (GRO) consists of four ac- 
tively shielded NaI(Tl)-CsI(Na) phoswich detec- 
tors |(Johnson et al 1993)| . Collimation is pro- 
vided by tungsten collimators, which define a 
3.8° x 11.4° (FWHM) field of view. Each detector 
is mounted on an independent single-axis point- 
ing system, which allows for instrument point- 
ings which are offset from the spacecraft Z-axis. 
Complex pointing plans for the OSSE instrument 
may be effected via the programmable detector 
orientation system. 

The original functional intent of OSSE was to 
be a "point-and-count" spectrometer, i.e., the de- 
tector would be oriented toward a source of inter- 
est and counts would be accumulated, with offset 
pointings providing background estimates for the 
observation. The ability to program more com- 
plex pointing plans, however, raises the possibil- 
ity of using the non-uniform aperture response 
to our advantage. By scanning the instrument 
in steps smaller than the aperture size, and tak- 
ing scans that overlap in at least two different 
directions, we can (in principle) use the knowl- 
edge of the aperture function and the differential 
flux measured between detectors to distinguish 
features at a substantially better resolution than 
that implied by the aperture size. 

A key goal of OSSE scanned observations is 
to perform sky-survey work, and attempt to de- 
tect previously unknown point sources. A sepa- 
rate but related project is to map the low energy 
galactic 7-ray emission. However, the nature of 
OSSE scanned observations presents some ma- 
jor difficulties to standard data inversion tech- 
niques. Typically, the total number of observa- 
tions is small (0(100)), each with a fairly low 
signal-to-background (~ 0.1%). From this, we 
would like to construct a map of the flux in a 



rather larger set of pixels (0(1000)), where the 
pixels are significantly smaller than the aperture 
size. This is obviously an impossible task for 
"direct" deconvolution or inversion, and while 
model fitting is more feasible, it is also unde- 
sirable due to the bias introduced by selection 
of a particular model. We have thus developed 
a new approach which is highly effective at de- 
tecting point sources in an unbiased manner, i.e., 
with no previous knowledge of their number, lo- 
cation, or strength. This approach begins with 
the problem phrased as one of deconvolution, and 
ultimately transforms it to one of model fitting. 

2. Direct Deconvolution 

OSSE is a linear instrument, in that the de- 
tected signal is a linear function of the source 
intensities (this holds true for many types of in- 
struments). Due to the linear nature of OSSE, 
it is perhaps most "natural" to deal with sta- 
tistical fluctuations (Poisson noise) in the data 
by use of linear least squares (LLSQ) techniques. 
For the purposes of this paper, it will be useful 
to distinguish between two types of LLSQ prob- 
lems, namely model fitting and deconvolution. 
By LLSQ model fitting we mean the adjustment 
of the co-efficients in a given linear model to fit 
a given set of count data (as described, e.g., by 
Wheaton et al. 1995, and references therein). 
An example would be fitting an OSSE data set 
to a model consisting of several point sources at 
known locations, and a linear background model 
with a finite (small) number of terms of known 
form. Deconvolution, on the other hand, refers 
to situations in which the model is not known, or 
perhaps has a known form but a large (in prin- 
ciple infinite) number of terms. An example of 
deconvolution would be fitting the same OSSE 
data for observations containing diffuse sources, 
or with an unknown number of sources at un- 
known positions. 

The essential distinction between the two is 
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that the former problem involves solving a fi- 
nite set of linear equations, whereas the latter 
requires solution of an integral equation, e.g., 

D(y) = J E(y,x)I(x)dx, (1) 

where D is the expected data, / is the image in- 
tensity distribution as a function of spatial coor- 
dinates x, and E(y, x) is a known function, essen- 
tially the OSSE aperture function in our context, 
relating the image space X to the data space Y. 
Solving for I(x) given D(y) and E is a linear in- 
verse problem, of a type common in astronomy. 

Discrete LLSQ model fitting methods may be 
applied to the deconvolution problem if we ap- 
proximate the linear integral equation (|l]) by a 
finite matrix equation: 

D = EI, (2) 

where now Di = D(yi), Eij = E(yi,Xj), and 
Ij = I(xj). Because of the noise (counting statis- 
tics, in the Poisson case), D is never observed 
directly. The measured quantity is the observed 
counts D, which will contain Poisson noise. In 
astronomical imaging applications, we may try a 
huge model representing the cosmic source terms, 
consisting of a source at "every possible loca- 
tion", that is, an array of pixels spaced closely 
enough to be a good approximation to I(x) in 
the integral ([!]). This approach we have previ- 
ously called Direct Linear Algebraic Deconvolu- 
tion (DLAD, Dixon et al. 1993). Wheaton et 
al. 1995 show that the DLAD estimate coincides 
with the Poisson Maximum Likelihood (ML) if 
the pixel array is the same. As DLAD involves 
solving linear rather than non-linear equations, it 
is computationally efficient, a point which shall 
prove crucial in what follows. 

Two difficulties arise at once. First, if the data 
space Y is binned finely to represent the inte- 
gral faithfully, then for the resulting tiny bins 
the expected counts Di = D(yi) in data bin 



i may be small, so that older Poisson LLSQ 
(PLLSQ) model-fitting methods fail due to "in- 
sufficient statistics" (that is, for small numbers 
of counts, the Poisson distribution is poorly ap- 
proximated by a normal distribution). Second, 
when the pixel array Xj is chosen fine enough, 
then E, the discretized matrix representation of 
E(y,x), becomes highly ill-conditioned or singu- 
lar. Effective solutions to the first difficulty have 
been discussed by Wheaton et al. 1995. The 
key point is that the weights for the LLSQ equa- 
tions should be chosen to be uncorrelated with 

the data D itself, effectively ruling out the com- 

i 

monly used approximation Oi « Df (as opposed 
_ i_ 

to ai = Di 2 , which is of course exact). Bear- 
ing this requirement in mind, it is possible to 
construct PLLSQ estimators which are unbiased 
and otherwise convenient for arbitrarily low Di. 
With a weighting matrix W so chosen, the "nor- 
mal equations" (see references in Wheaton et al. 
1995) are 

E T W 2 EI = E T W 2 D, (3) 

which may be solved for I, given D, in the usual 
way. 

The second difficulty has often been summa- 
rized by describing the discretization of the lin- 
ear inverse problem as "ill posed". We distin- 
guish two aspects of the situation: (a) failure 
of the numerical mathematical problem, which 
is essentially an artifact, due to the incapac- 
ity of simpler matrix inversion algorithms when 
faced with ill-conditioned or singular problems 
(especially large problems), and (b) the violent 
anti-correlation which arises among near-by pix- 
els when the pixel size is < angular resolution 
of the instrument. The algorithmic aspect of 
the problem is resolved by recourse to singu- 
lar value decomposition (SVD; see Press et al. 
1986 for a simple discussion) in the computa- 
tion. The anti-correlation among pixels is math- 
ematically inescapable, due to the real confusion 
among nearby points, which the instrument is 
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unable to resolve clearly. The problem can be 
partly removed by making use of the physical 
constraint of non- negativity of the source fluxes, 
as previously discussed in connection with imag- 
ing from COMPTEL data (Dixon et al. 1993, 
and Wheaton et al. 1993, and more generally 
by Dixon et al. 1996). The non- negativity con- 
straint has been enforced by a variant of the 
Non-Negative Least Squares (NNLS) algorithm 
of Lawson and Hanson (1974), and the resulting 
technique is termed Constrained Linear Algebraic 
Deconvolution (CLAD) |(Dixon et al 1996} . 

Figure [j] shows a map, using data from the 
OSSE Virgo Survey, made using the CLAD method 
with a resolution of 2° in the 50-150 keV energy 
band. The irregularly shaped exterior region 
bounds the area of significant exposure during 
the scanned observation. The exposure matrix E 
was calculated from the OSSE aperture response, 
and gives the contribution to each data point 
from each sky pixel. In this observation, the de- 
tectors were stepped at increments of ~ 1.8° for 
individual scans, with the spacecraft orientation 
changed ~ 4° between scans. The two strong 
sources in the FOV, 3C 273 and NGC 4388, ap- 
pear prominently, but there are several other spu- 
rious bright sources, albeit of lower statistical sig- 
nificance. We denote these pixels as "spurious" 
due to their lack of correspondence with any can- 
didate 7-ray source, as well as their marginal sig- 
nificance (3C 279 is in the field, but no significant 
feature is associated with it in the map). An im- 
portant point is that due to the nature of the 
observations, the reconstructed source intensity 
doesn't necessarily have a direct correspondence 
to its statistical significance. Note that M87 is 
within the same pixel as NGC 4388, but it as yet 
unclear if its contribution to the total flux is sig- 
nificant in this energy range. 

While CLAD as illustrated in Figure [l] appears 
to be a useful tool, the spurious sources are, to 
say the least, cosmetically unattractive. That 
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Fig. 1. — Image found generated by solving the 
DLAD equations, subject to the constraint that 
the answer be non- negative. Bright sources in the 
FOV are labelled. Note other bright, spurious 
sources. The image resolution is 2°. 

they are generally non-significant suggests that 
we could still make a map which fits the data well 
even if they were somehow eliminated. A more 
serious problem is that the large number of pixels 
retained in the model seriously degrades the sen- 
sitivity of the maps produced. We are thus led 
to seek an objective means of eliminating from 
the model those pixels which are not statistically 
required by the data. A development based on 
the pixon concept of Puetter and Piha (1993), 
described in the following section, gives a pow- 
erful new image reconstruction algorithm which 
fulfills this need. 

3. Pixon Based Inversion 

In a broad sense, the problems we encountered 
above occurred because we tried to fit too many 
parameters, i.e., more than were statistically jus- 
tified by the data. For example, consider a large 
region in an image that is statistically consistent 
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with being flat. Ideally, we could represent the 
information in this region with a single number, 
such as the average flux. However, since we don't 
necessarily know a priori which regions are flat 
and which aren't, the standard approach is to 
cover the FOV with small pixels, and find the 
flux in each. In the statistically flat region, where 
there is little coherent information spatial infor- 
mation, but lots of noise, the inversion algorithm 
quite happily places flux in many of the small pix- 
els to achieve gains (albeit statistically marginal) 
in the goodness-of-fit (GOF, e.g., x 2 , likelihood, 
etc.). The problem, then, is less with the inver- 
sion algorithm itself as opposed to the choice of 
image basis that we supplied to that algorithm. 

Many attempts to counteract this problem 
concentrate on placing some sort of extra con- 
straint (hard or soft) on the form of the solution, 
and in this way, effectively reducing the number 
of "free" parameters. A prime example of this is 
Maximum Entropy (ME) |(Skilling 1989] . Such 
approaches are effective to some degree, but often 
are limited in their realm of applicability, as well 
as in that they yield biased solutions |(Donoho~et 



al L992) . The pixon approach also attempts to 
reduce the number of free parameters, but does 
so in a more direct and fundamental manner. For 
the purposes of this paper, we shall limit some- 
what the discussion of the conceptual underpin- 
nings of pixons; for more details, the reader is re- 
ferred to (Puetter 1994) . The basic idea behind 



a pixon is that it is a sort of flexible pixel (Puct 



ter |and Piha 1993)| , able to change it's shape and 
size. In the pixon approach, a large flat area in 
the image would not represented by many pixels 
containing the same intensity, but rather would 
be represented by a single pixon, with a single 
number for the intensity and a few others de- 
scribing the shape. An image described by pixons 
would thus require a smaller set of parameters. 

The criterion we use in this approach is essen- 
tially Occam's razor: the simplest model which 



yields an image statistically consistent with the 
data is the correct model. The next question is 
how to go about finding the simplest model. The 
starting point is to define a fundamental model, 
consisting of a grid of pixels at the smallest res- 
olution we expect to be able to see, as well as 
the instrumental response from those pixels. For 
OSSE, this is represented by the exposure matrix 
E, whose columns form the fundamental basis. 
All pixon bases are simply derived as positive lin- 
ear combinations of the fundamental basis func- 
tions. We represent this linear transformation by 
the matrix K, and thus the new model matrix is 
simply E' = EK. Equation then becomes 



K T aKl( p ) = K T f3, 



(4) 



where a = E T W 2 E and [5 = E T W 2 D, and l( p ) 
is the pseudo-image. Equation || may be solved 
for l( p ) subject to the non-negativity constraint 
to yield the constrained least-squares coefficients 
in the pixon basis. To see the actual sky map, 
we need the representation in the fundamental 
basis, for which we simply compute I = Kl( p ). 

The reader at this point may wonder why we 
don't just substitute I = Kl( p ) into eqn. |] and 
solve for I directly. For that matter, a quick bit 
of algebra would seem to indicate that all of the 
K's cancel out when solving for I. However, it is 
to be noted that direct algebraic solution for l( p ) 
gives the unconstrained least-squares solution. 
Enforcement of the non-negativity constraint im- 
plies that some elements of I^ p ) are constrained 
to be zero. Only those columns of E' which will 
yield positive coefficients are used as the basis for 
the fit, which proves to be the key in making the 
pixon approach effective. Let us illustrate with a 
simple problem. We take I to be two-dimensional 
and D to be three-dimensional. E is thus a 3 x 2 
matrix, which we represent as [Ei,E2], where 
Ei and E2 are the 3-vectors forming the funda- 
mental basis. Figure [2] shows the plane spanned 
by the fundamental basis, where Dt rue is the 
true value of D, which we would measure in the 
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limit of infinite statistics. Measurement noise, 
however, will cause D to be found away from 
Dtrue> an d generally contain some component 
perpendicular to the plane. So we perform a 
least-squares fit to find Dls, which is simply the 
orthogonal projection of D onto the plane. As 
Dls lies within the convex region bounded by 
Ei and E2, the components of I will be positive. 
Now we form the pixon basis E' = [Ei, Ei + E2], 
and see that Dls lies outside the convex region 
bounded by this basis. The non- negative least- 
squares estimate D p i xon is found by projection of 
Dls onto the nearest positive subspace (Werner 
19j0)|, which in this case is the vector Ei + E2. 
The pseudo-image is thus I^ p ^ = [0,I 2 P ' ) ], giving 



the image I 



r(p) t(p)i 



which is obviously not 



the same is the image corresponding to Dls- 

The above exercise, while rather simple, does 
show precisely why the pixon approach is supe- 
rior to straightforward constrained ML fitting. 
The pixon basis, as it is formed of positive lin- 
ear combinations of fundamental basis functions, 
will always bound a smaller convex region than 
the fundamental basis. As shown in Figure [||, 
if this reduced region contains Dt rue) then the 
estimate D p j xon is likely to be closer to D trU e 
than the ML estimate. Occam's razor has served 
us well: the fundamental model required two ba- 
sis functions to perform the fit, but due to the 
constraint, the pixon model required only one, 
and yielded a superior estimate. In the context 
of image reconstruction, Occam's razor implies 
that the optimal pixon basis will allow only that 
level of detail that is statistically justified by the 
data. Areas in the image which are statistically 
smooth should be constrained to be smooth in 
the fit, thus reducing the number of free param- 
eters. As a result, spurious sources are largely 
done away with, and correspondingly, the flux 
estimates and sensitivity for actual sources are 
improved. 




Fig. 2. — A schematic of the effect of the pixon 
prior. Dt rue is the "noise-free data" , i.e. what we 
would measure in the limit of infinite statistics. 
E\ and E2 represent fundamental basis functions 
for the instrument, while {E\ + E<i)j1 is a basis 
function for a pixon basis. Dls is the constrained 
least-squares estimate for the fundamental ba- 
sis, while Dpi xon is the constrained least-squares 
estimate for the pixon basis. Note that D p i Xon 
requires only one basis function as opposed to 
two for Dls, an d further that D p i xon is closer to 

Dfrue ■ 



4. Finding the optimal pixon basis 

While the advantages of using the optimal 
pixon basis for image reconstruction are obvious, 
what is not obvious is how one goes about find- 
ing it. The set of possible basis functions consists 
of all positive linear combinations of the funda- 
mental basis functions. Further, there exists no 
a priori systematic way of ordering the possible 
ways of choosing a basis, i.e., the only way com- 
pare the merits of two different bases is to actu- 
ally perform the fit for each one. We must find 
some way of systematizing the search procedure. 
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The first step is to choose some set of poten- 
tial pixon shapes. This eliminates the possibility 
of being absolutely optimal, but greatly shrinks 
the set of possible basis functions. Following 
Puetter and Piha |(Puetter and Piha 1993) , we 
have implemented a fuzzy pixon basis. Here, the 
pixon boundaries are not sharply defined, and 
the pixons are allowed to overlap. For simplicity, 
we shall use pixons which are circularly symmet- 
ric, and thus describable by a single parameter. 
For the pixon shape we have chosen, there will be 
associated with each pixel in the image a length 
Sj, called the pixon scale, which defines the size 
of the pixon. The pixon shape function we have 
chosen is an inverted paraboloid (Puetter 1994) 



and thus the pixon transform matrix K is defined 
as 



jk 



V 




djk < 5 
djk > 8 



K 



Fjk 
V ' 



(5) 



where djk is the distance between image pixel j 
and pseudo-image pixel k, and we have a one- 
to-one correspondence between the image and 
pseudo-image pixels. Further, we restrict 5j to 
some finite set of values. The implementation 
used for this paper utilizes two values, a small 
one <JW (usually one pixel) for pointlike sources, 
and a large one 5^ (roughly the size of the image 
space) for modeling diffuse background emission. 

Given the above restrictions on K, we now 
have a well-defined set of potential basis func- 
tions. However, this set is still forbiddingly large. 
With two pixon scales and an image space of di- 
mension J, the number of possible bases is 2 J , 
thus a brute force search is infeasible for all but 
very small problems. The problem of finding an 
optimal pixon basis is thus one of combinatorial 
(rather than functional) minimization. Practical 
techniques for attacking such problems are gener- 



ally heuristic in nature, and promise only a near- 
optimal solution. For OSSE sky survey work, 
we anticipate looking for point sources within 
an otherwise roughly uniform background, im- 
plying that we really only need two pixon sizes, 
one small (for point sources) and one large (for 
the background). The choice of two disparate 
types of basis functions leads naturally to an op- 
timization heuristic called mean field annealing 
(MFA) |(Reeves 1993) . Annealing algorithms de- 
rive from the statistical mechanical analog of an- 
nealing a material from an initially disordered 
state, where one begins at a high temperature 
and slowly cools the material. Consider the case 
of a ferromagnetic material. In the presence of 
a uniform external field, each atom in the ma- 
terial can have one of two possible spins (up or 
down), and the configuration space of the mag- 
net is simply all possible combinations of ups 
and downs. The configuration corresponding to 
a defect-free magnet occurs at the global mini- 
mum of the Hamiltonian, where all of the spins 
point in the same direction. However, the Hamil- 
tonian also has many local minima, correspond- 
ing to the formation of domains, where groups 
of spins in different regions of the magnet point 
in different directions. To anneal a magnet, one 
begins at a temperature high enough that kT 
greatly outweighs the interaction energy. At high 
temperature, the spins will be oriented randomly, 
and on average half will be up and half will be 
down. If the system is cooled too quickly, it 
"quenches", becoming trapped in a local mini- 
mum. Slow cooling, however, allows the system 
to "jump out" of local minima via thermal fluc- 
tuations. By lowering the temperature in steps, 
and allowing equilibrium to be reached at each 
step, we can arrive at a nearly defect-free mag- 
net in the low temperature limit. 

The problem of finding the pixon basis is sim- 
ilar. Here, the configuration space is all possi- 
ble assignments of pixon sizes to the pixels of 
the fundamental basis. With two possible pixon 
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sizes, the problem is closely analogous to that 
of the magnet, where we tried to assign one of 
two possible values of spin. The Hamiltonian we 
choose consists of two parts: the \ 2 ■, and a func- 
tion J2j Sj j where Sj is a linear function of Sj , and 
is taken to be 1 at the smallest value of Sj, and 
for the largest value. This second function acts 
as a penalty against complexity, utilizing the fact 
that smaller pixons (more detail) generally imply 
more parameters in the model. The particular 
functional relationship of Sj to Sj isn't so impor- 
tant, for as we'll see below, it is only changes 



in is, that will be relevant. We then write the 



following Hamiltonian for the problem: 



(6) 



with x 2 evaluated for the result of a fit using a 
particular set of pixon scales 5. This Hamiltonian 
represents a balance between finding the simplest 
model (all big pixons) and satisfying the \ 2 -> with 
the constant 7 weighting the relative importance 
of the two terms. We then define a parameter T 
analogous to temperature, and write the Boltz- 
mann distribution 



P(S,T) 



-H(6)/T 

~~z ' 



(7) 



where Z is the partition function, Z = J2s e~ H ^' T , 
with the sum performed over all possible config- 
urations 5. 



Our ultimate goal is to find the average val- 
ues < 5 > at a given temperature for the spec- 
ified Hamiltonian and Boltzmann distribution. 
Clearly, this calculation cannot be carried out 
exactly for an interesting number of pixels, as it 
requires calculating eqn. |7] over 2 J configurations. 
We note that for T constant, the equilibrium en- 
ergy is equal to the average energy. Define H^ 
as the average energies for states with Sj held 
fixed at one of the two allowed values 5®, i.e., 
Hij =< H(S) > \g S (i). We then approximate 
the average value of Sj at equilibrium for a given 



<5j> = 



temperature eqn. ^ as 

g) expj-Hy/T) + S^ exp(-H 2j /T) 

expi-Hy/T) + exp(-H 2j /T) 
6 W + g) exp [{Hlj _ H 2j)/T ] 

1 + exp [(Hij - H 2j )/T] ' [ ' 

where the mean or effective field at j is $j = 
Hij — H 2 j. This seems easy enough. However, 
calculation of H^ still requires summation over 
all possible configurations. We thus make the ad- 
ditional approximation (Van den Bout and Millci 
1990)| , that < H(5) >~ H(< 5 >), i.e., the av- 



erage value of the energy is equal to the Hamil- 
tonian evaluated at the average values of 5 (for 
the reader who finds this procedure a bit ad 
hoc and suspicious, we shall investigate it more 
deeply in the following section). At each step in 
the algorithm, then, < Sj > is calculated from 



H(< 5 >) 



6; =8(1) 



, with the values in < 5 > being 



taken from the previous step, leading to the MFA 



algorithm (Van den Bout and Miller 1990) : 



1. T = Tq = initial temperature 

2. do until saturation 

(a) do until equilibrium 

i. choose a random pixel j 

ii. set Sj = 

iii. find the CLAD solution for this 
basis 

iv. Compute H\j for that solution 

v. set Sj = 5 (2) 

vi. find the CLAD solution for this 
basis 

vii. Compute H 2 j for that solution 

viii. compute < Sj > from eqn. ^ 

(b) T = cT 

"Equilibrium" is defined as the point where the 
Sj 's are no longer changing significantly at a par- 
ticular temperature (in practice, this seems to 
require only one or two iterations). "Saturation" 
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is the point where the Sj's all take on one of 

the possible predefined values, in this case, 

or 5^ 2 \ The constant c represents the cooling 



easy to perform (Gill, Murray, and Wright 1991) . 
The original Lawson and Hanson implementation 



rat |s, and must be chosen carefully. If it is too 



small, the system quenches, and the low tem- 
perature limit produces a solution that is in a 
non-optimal local minimum. If c is too large, 
the algorithm takes too long to complete. Nomi- 
nally, the initial temperature may be chosen such 
that the system is initially maximally disordered, 
i.e., 5j = (S^ + S^)/2. In practice, one can 
choose a somewhat smaller temperature, where 
some degree of structure (though not too much) 
is present in the Sj's. The constant 7 is also 
of importance, as overweighting of the x 2 gives 
spurious sources, and underweighting suppresses 
weak sources. Note also from eqn. || that we only 
use the difference between H\j and R-2j- Since we 
modify only the Sj for a particular j while hold- 
ing the others constant, H\j — B.2j = &-X 2 ~ ^ s j- 
Given the definition of Sj, Asj is simply 1. 



We note that this algorithm requires a good 
deal of solving of the least squares problem for 
different bases, which may seem to impose a large 
computational burden. However, the variant of 
NNLS we have developed mitigates this to a great 
extent. Partially, this is because we are solving 
the least squares problem within a linear alge- 
braic context, which is much more computation- 
ally efficient than non-linear minimization. An- 
other contributing factor is our variant of NNLS, 
which takes advantage of the fact that only some 
subset of the parameters is unconstrained, and 
uses a QR factorization updating scheme rather 
than re-solving the entire problem when only a 
single Sj is changed. This algorithm will be de- 
scribed in a subsequent paper; a brief description 
is given here. The Q-R-decomposition algorithm 
factors a matrix into an orthonormal part (Q) 
and an upper triangular part (R) KGill, Murray,! 
an d Wright 199lj| . A useful property of QR fac- 
torization is that rank one updates (e.g., addition 
of a column to the original matrix) are relatively 



of NNLS takes advantage of this (Lawson and 
Hanson 1974)| . Within the pixon framework, ad- 



dition/removal of a particular pixon to/from the 
active set is also a rank one update. Further, this 
action generally doesn't modify the constraint set 
a great deal. By using the results from a previous 
NNLS step, and applying the Q-R-updating, we 
generally save a good deal of time over redoing 
the entire NNLS procedure. The result is that for 
J pixels, A" possible pixon sizes, L temperature 
steps, and an average of M unconstrained param- 
eters, the algorithm performs like 0(JLNM 2 ). 



5. Further discussion of the algorithm 

In this section, we present a more detailed look 
at the inner workings of the mean field annealing 
algorithm described above. The "proof is in the 
pudding" reader may wish to skip this section on 
first reading, and move directly to the section on 
the application of the algorithm to OSSE data. 

The above derivation of the mean field equa- 



tions (following that in (Van den Bout and Miller 
1990)1) , while based on seemingly reasonable as- 
sumptions, does not make much rigorous contact 
with statistical mechanics. To make this connec- 
tion clear, we will present the derivation given 
in (Reeves 1993), which is a standard mean field 
approximation procedure. We begin with a set 
of J "spins", s, which are allowed to take on the 
values and 1. The interactions between these 
spins is described by a Hamiltonian or energy 
function, H(s). Correspondingly, for a system 
which is Boltzmann distributed at temperature 
T, the partition function Z is given by 



-H{s)/T 



(9) 



where the sum is taken over all possible configu- 
rations of s. Next, we replace s with continuous 
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variables v: 



Z = Y,j d[v)e- H ^l T \[5{s 3 -v 3 ), (10) 

W 3 

which, via the presence of the 5-function, is the 
equal to the previous expression. The function 
H(v) now represents an effective energy in terms 
of the new variables. We then Fourier expand the 
(5-functions in conjugate variables u, obtaining 

Z = Y^J d[v] J d[u]e- H ^/ T l[e u ^- v ^, 



(11) 



and carry out the sum over s: 



(12) 

We have thus rewritten the partition function 
entirely in terms of the new variables, and as 
yet have made no approximation. Next, the sad- 
dlepoint approximation is made, which approxi- 
mates the value of the integrand by it's maximum 
value. The point where the integrand takes on its 
maximum value is given by 



u j 



1 dH(v) 
1 



1 + e u i 



(13) 



Combining these gives the implicit equation for 

(14) 



1 8H(v) \ —1 

v , = I 1 + e T dv i 



average of Sj, evaluated in the mean field ap- 
proximation. The Hamiltonian in our applica- 
tion, however, is not so simple. In this case, the 



exponent in eqn. 14 is given by 



7^ — + 1, 

dvj 



(15) 



where the functional dependence of y 2 on v is 



+ 8( 2 \ 



given via the relation 5j -u-'^-u^tu--",. 
Given that we evaluate y 2 after solving the con- 
strained least squares problem in the basis asso- 
ciated with 5, it is not at all clear that one can 
obtain a functional form for its partial deriva- 
tives. Even if we were to make the reasonable 
assumption that small perturbations in 5 do not 
change the constraint set, the expression for the 
partial derivative is quite complicated, and dif- 
ficult to evaluate numerically. An obvious rem- 
edy is to make a numerical approximation of the 
derivative, 

A X 2 



with Av* 



X 2 ) 



dx 

dVj 



Ava 



(16) 



and Ax 2 the difference in 



y 2 values evaluated at vP and . Substituting 



the approximate derivative into eqn. 14, setting 



Avj = 1, and doing a little algebra, we obtain 



-H 2j 



-Hi 



J + e" 



-H 2j ' 



(17) 



In this expression, the interpretation of Vj as the 
thermal average of sj within the above approx- 
imations, is obvious, and as such, leads directly 
to eqn. @. 



The question immediately arises as to the in- 
terpretation of the new variables v, for at first 



glance, eqn. 14 bears little resemblance to some- 



thi ng like eqn. — For many optimization prob- 



lems, snrii as graph partitioning |(Van den Borrt 



anc |i Miller 1990") , the form of H is such that 



dH/dvj = the mean field at spin j. In this 



case, Vj m eqn. 



14 takes the form of a thermal 



Another question which arises is whether or 
not the mean field approximation is even valid for 
the problem at hand. Methods exist in statistical 
mechanics for deciding this |(Pfeuty and Toulouse 



1977) , but are difficult to apply in a general man- 



ner, given the nature of H in our problem. Gen- 
erally speaking, one expects the approximation 
to be reasonable in the case where the number of 
"spins" is large, so that individual interactions 
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are small compared to the mean field, and/or 
when the dimensionality of the system is large, 
such that each "spin" has many neighbors. For 
the image reconstruction problem, we would ex- 
pect the first condition to be satisfied when the 
FOV is a fair amount larger than the extent of 
the instrument response, such that there exist 
many groups of pixels which are effectively de- 
coupled. The second condition is more subtle. 
The dimensionality of our system is nominally 
2, but, when we are attempting to superresolve 
the image, i.e., the pixel size is smaller than the 
extent of the instrument response, the "forces" 
in the problem are effectively long range, as spa- 
tially separated pixels will be correlated via the 
response. For long range forces, the effective di- 
mension of the system is larger than the geomet- 
ric dimension (Pfeuty and Toulouse 1977) . The 



for Vj is that of the algorithm described above, 
which in terms of eqn. 14 is just the iteration 



precise calculation of this number for the image 
reconstruction problem is difficult, and problem 
dependent. Regardless, it should be clear that 
for the case of small pixels, each "spin" will feel 
the influence of many nearby "spins" . 

Now, we have obviously made just about the 
coarsest approximation possible for dH/dvj, but 
there are some decided benefits to the particular 
choice we have made. First, as we saw above, it 
allows us to recover the interpretation of v as an 
approximate thermal average. Closely related is 
the property that for the approximation we have 
chosen, as T — > 0, Vj (and thus < Sj >) takes on 
one of its two extremal values, which was really 
our goal in the first place. Due to the compli- 
cated nature of H(y), this won't necessarily oc- 
cur for the exact solution of eqn. |14| in the low T 
limit. This property is of some importance in sky 
survey work, where we don't really want sources 
showing up as different sized pixons depending 
on the statistics of the dataset. 

The final benefit has to do with the method of 
solving for vj. As eqn. 14 is implicit in Vj, direct 
solution is impossible. A simple way of solving 



(n+l) 



1 + e 



1 " (n) 



f(v (n) )- (18) 



The solution of eqn. [L4| is represented by a fixed 
point (of which there is likely more than one) of 
the mapping defined in eqn. 18. Now, as f(v^ n ') 
of eqn. 18 necessarily lies between and 1, we 



are guaranteed the existence of a fixed point. 
We are not, however, guaranteed of convergence 
to that fixed point. The stability of the fixed 
point is basically determined by the eigenval- 
ues of the Jacobian of the map in the neighbor- 
hood of the fixed point (actually, the analysis 
is somewhat more complicated in the serial up- 
dating scheme which we are using, but the fla- 
vor is the same (Reeves 1993)| ). If the eigenval- 
ues Aj are such that —1 < Aj < 1, the fixed 
point will be stable, and we will have conver- 
gence. If not, other undesirable behavior will oc- 
cur. Though a full eigenvalue analysis is clearly 
out of the question, numerical experiments have 
shown that using "better" approximations for 
dH/dvj will often result in behavior which is os- 
cillatory, or even chaotic. In our coarse approx- 
imation, however, Avj = 1, and < f{v) < 1, 

so dv ^ n+1 ^ /dv^ < 1. Though we don't prove 
that this implies stability of the fixed point, we 
do note that the nature of the problem is such 
that the Jacobian will likely be diagonally domi- 
nated (i.e., pixons which are well separated have 
little effect on each other w.r.t. v or S), and so we 
would expect our coarse approximation to have 
some stabilizing effect; and in fact, in all of our 
applications of the algorithm, convergence has al- 
ways been achieved using this approximation. 

Though we have argued for the proposed al- 
gorithm's effectiveness in the case of two pixon 
sizes, it should be fairly clear that it won't be 
very effective for three or more. In the above, we 
see that H(v) is essentially replaced by a linear 
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approximation, which is fine for capturing the av- 
erage behavior between two extremal pixon sizes. 
However, with an intermediate pixon size, one 
expects that, in some cases, H will have a signif- 
icant minimum in between the two extremal val- 
ues, and the linear approximation fails to capture 
this. Thus the extension of the MFA algorithm 



to > 2 spin states as suggested in (Van den Bout 



an d Miller 1990) is not appropriate for the prob- 
lem at hand. We suggest that > 2 pixon sizes 
might be accomodated within the MFA frame- 
work via a higher order approximation for H, 
but leave this as an avenue for future research. 



6. Application to OSSE data 

To test the effectiveness of our MFA pixon al- 
gorithm, we applied it to the same dataset used 
to compute Figure |l]. For this purpose, we set 
the initial temperature To = 3., 7 = 0.01, and 
c = 0.3. The computation required approxi- 
mately 15 minutes of CPU time on a VAXsta- 
tion model 4000-90 to compute the image for 209 
pixels. The result is shown in Figure |3], and is 
obviously far superior to the constrained least 
squares result. Most notably, none of the spu- 
rious sources seen in Figure |l| are found by the 
MFA algorithm, and the two expected sources 
stand out clearly. An apparent broad gradient 
is also noticeable; this is a known background 
effect for OSSE scanned observations, which we 
have intentionally not removed to show the util- 
ity of the pixon approach. For comparison with 
a more established technique, Figure |] shows a 
Maximum Entropy reconstruction of the same 
dataset. The suppression of spurious sources by 
the MFA pixon algorithm is not only aestheti- 
cally pleasing, but also implies that more confi- 
dence can be placed in the flux estimates of the 
real sources. Another measure of the success of 
the MFA algorithm is the number of basis func- 
tions required by the fit. The CLAD fit, using 
the NNLS algorithm, required 56 basis vectors. 
The pixon image required only 4. It should be 
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Fig. 3. — Image generated using the MFA pixon 
algorithm to find a near-optimal pixon basis. 
Note that the strong sources stand out, but that 
the spurious peaks common in Figure 1 have been 
suppressed. The image resolution is 2°. 

emphasized that the image in Figure ||| was gener- 
ated without any a priori bias given to those pix- 
els containing real sources. All pixels are treated 
equally by our algorithm, with no assumptions 
being made about source locations. Further re- 
sults for the Virgo survey are presented in a sep- 
arate paper KKurfess et al 1997H - 



7. Future directions 

The MFA pixon algorithm proves to be re- 
markably successful, despite certain limitations 
which will be addressed by future research. Most 
notably, there currently exists no rigorous way to 
estimate the parameters To, c, and 7. For appli- 
cation to OSSE data, we have made a heuris- 
tic estimate of 7 = 1 /Ax 2 (3<r), where Ax 2 (3cr) 
is the change in the \ 2 required to produce a 
3a change w.r.t. the x 2 distribution where the 
number of degrees of freedom is calculated from 
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Fig. 4. — The same data as in Figure 3 re- 
constructed with Maximum Entropy. Apparent 
sources other than those which are known to oc- 
cupy the region are of marginal significance, and 
we attribute them to the scan dependent back- 
ground effect, which appears as a broad gradient 
in Figure 3. 

the number of data points. The idea is that any 
addition of further detail to the model which pro- 
duces less than a 3a improvement to the x 2 win 
be outweighed by the pixon penalty of Asj = 1. 
From there, we may further refine our estimate 
of 7 by examination of the final x 2 value. If the 
X 2 falls outside the desired confidence interval, 7 
should be lowered to admit more parameters in 
the final model. The other two parameters, To 
and c were essentially guessed. One would like 
to minimize To, so as not to spend time comput- 
ing iterations which produce no structure in the 
image. By examining results from the first itera- 
tion, one can tell if To was chosen too large or too 
small. If too large, the Sj's will show no struc- 
ture, all having essentially the same value. If too 
small, some of the Sj's will saturate to one of the 
predefined values. Somewhere in between these 
two extremes lies the correct To, where some (but 
not too much) structure is apparent, but at this 
point, judgements of how much structure is ap- 
propriate are basically subjective. Similarly, a 
small value of c will hasten the convergence of 
the algorithm, but a value too small will lead to 



a solution in a non-optimal local minimum. For 
both To and c, it is best to err on the side of cau- 
tion and overestimate their values. This leads 
to a somewhat larger computational burden, but 
also increases confidence that the near-global op- 
timum will be found. 

Despite this seemingly large amount of arbi- 
trariness, it turns out that the results from the 
MFA pixon algorithm are essentially constant 
over a large range of parameters. Experience has 
shown that even quite bad estimates of To, c, or 
7 still lead to excellent results. We have changed 
the weighting factor 7 over as much as two orders 
of magnitude, and at worst have seen only a few 
low significance spurious sources, which still rep- 
resents a vast improvement over previous meth- 
ods. Local minima obtained when To or c are 
underestimated generally show sources shifted by 
one or two pixels, but statistically important de- 
tail is never omitted completely (provided 7 isn't 
horribly underestimated) . 

8. Conclusions 

The ability to create skymaps using data from 
OSSE scanned observations has been demon- 
strated. For this purpose, we have generated a 
new image reconstruction algorithm which uti- 
lizes the concept of the pixon. The driving force 
behind the pixon concept is simply Occam's ra- 
zor, where we attempt to find the simplest pos- 
sible basis for the reconstruction that yields an 
acceptable value for the x 2 ■ As the problem 
of finding such a basis is combinatoric in na- 
ture, we have employed an approximate tech- 
nique, mean field annealing, which gives a near 
optimal solution with reasonable computational 
requirements. The MFA pixon algorithm has 
been successfully applied to scanned data from 
the OSSE Virgo Survey, producing skymaps at 
resolution substantially smaller than the size of 
the OSSE aperture, while suppressing spurious 
sources common in pure maximum likelihood re- 
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constructions. Further, our algorithm has been 
shown to find the nearly minimum basis for the 
reconstruction, implying smaller errors on source 
flux estimates. The algorithm is most effective 
for the case where the source distribution consists 
of some number of highly localized sources in an 
otherwise slowly varying "background". We see 
that the MFA pixon algorithm occupies a place 
somewhere between standard deconvolution al- 
gorithms and model fitting algorithms, as it es- 
sentially transforms the former problem into the 
latter. 

One final note: the algorithm we have de- 
scribed has a somewhat different flavor than com- 
monly used regularized inversion schemes. Tech- 
niques such as Maximum Entropy estimate the 
image by finding the minimum of the sum of 
two (or more) functions of the image parameters. 
One of these functions specifies the goodness-of- 
fit, while the other tends to have the property 
of suppressing oscillatory behavior in the solu- 
tion. The MFA pixon algorithm, however, clearly 
is not estimating the image, but rather 5. In 
this case, the x 2 plays the role of a constraint, 
with 7 acting as a Lagrange multiplier. From 
these 5, the image is obtained via a simple con- 
strained least squares estimate. So, despite the 
fact that it appears that we've reduced the num- 
ber of parameters in the problem, we've actually 
just changed the parameter set. The effectiveness 
of the technique is at least due in part to the fact 
that between the \ 2 an d the non-negativity re- 
quirement on the image, that the total number of 
constraints is large, and the problem thus highly 
overdetermined . 
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